Speech processing
Mohammad Reza Yazdchi; Seyed Ali Seyed Salehi
Volume 1, Issue 3 , June 2007, , Pages 201-213
Abstract
One of the most important challenges in automatic speech recognition is in the case of difference between the training and testing data. To decrease this difference, the conventional methods try to enhance the speech or use the statistical model adaptation. Training the model in different situations ...
Read More
One of the most important challenges in automatic speech recognition is in the case of difference between the training and testing data. To decrease this difference, the conventional methods try to enhance the speech or use the statistical model adaptation. Training the model in different situations is another example of these methods. The success rate in these methods compared to those of cognitive and recognition systems of human beings seems too much primary. In this paper, an inspiration from human beings' recognition system helped us in developing and implementing a new connectionist lexical model. Integration of imputation and classification in a single NN for ASR with missing data was investigated. This can be considered as a variant of multi-task learning because we train the imputation and classification tasks in parallel fashion. Cascading of this model and the acoustic model corrects the sequence of the mined phonemes from the acoustic model to the desirable sequence. This approach was implemented on 400 isolated words of TFARSDAT Database (Actual telephone database). In the best case, the phoneme recognition correction increased in 16.9 percent. Incorporating prior knowledge (high level knowledge) in acoustic-phonetic information (lower level) can improve the recognition. By cascading the lexical model and the acoustic model, the feature parameters were corrected based on the inversion techniques in the neural networks. Speech enhancement by this method had a remarkable effect in the mismatch between the training and testing data. Efficiency of the lexical model and speech enhancement was observed by improving the phonemes' recognition correction in 18 percent compared to the acoustic model.